Data Quality on KDD: a Real-life Scenario
Abstract. The growing diffusion of IT-based services generates a large amount of data useful for supporting the activities of firms, organisations, and state agencies. In such a context, data quality tasks are frequently addressed using cleansing routines, often framed in the wider context of ETL (Extraction, Transformation, and Loading) processes. The design of these cleansing routines often relies on the experience of domain experts, which makes evaluating the quality level achieved a relevant concern for ensuring the believability of the analysed results. In this paper we describe two model-based techniques, one for evaluating the consistency of a dataset and one for identifying the cleansing alternatives. The techniques have been applied to a real-world dataset derived from the Italian labour market domain, which we have made publicly available to the community.
Similar resources
Data Quality Mining
In this paper we introduce data quality mining (DQM) as a new and promising data mining approach from the academic and the business point of view. The goal of DQM is to employ data mining methods in order to detect, quantify, explain and correct data quality deficiencies in very large databases. Data quality is crucial for many applications of knowledge discovery in databases (KDD). So a typica...
Data Quality Mining - Making a Virtue of Necessity
In this paper we introduce data quality mining (DQM) as a new and promising data mining approach from the academic and the business point of view. The goal of DQM is to employ data mining methods in order to detect, quantify, explain and correct data quality deficiencies in very large databases. Data quality is crucial for many applications of knowledge discovery in databases (KDD). So a typica...
Categorization of Association Rule Mining Algorithms
More and more computer science scholars and researchers, especially those who specialize in the field of Knowledge Discovery in Data (KDD), focus on and emphasize Association Rule Mining (ARM). ARM, arguably one of the most researched areas in KDD, addresses the problem of discovering association rules between items / attributes in very large databases. A number of significant ARM algorithms...
Rough Set Approach to KDD
This tutorial is a survey of rough set theory and some of its applications in Knowledge Discovery from Databases (KDD). It also covers a practical guide to analysing different real-life problems using rough set methods, as well as a presentation of the Rough Set Exploration System (RSES), which can be treated as preliminary material for the main conference and associated workshops.
Technical and Scientific Issues of KDD (or: Is KDD a Science?)
It has already been largely proven that Knowledge Discovery in Databases (KDD) is an interesting new research field, able to provide financial returns to the companies that are willing to invest in it. This fact demonstrates the excellent social value of KDD. A Science, however, is not uniquely defined by this feature. It also needs to show an internal logic, due to a specific approach to the...
Publication date: 2014